Goto

Collaborating Authors

 sub-gaussian random variable



Bias-variance Tradeoff in Tensor Estimation

arXiv.org Machine Learning

We study denoising of a third-order tensor when the ground-truth tensor is not necessarily Tucker low-rank. Specifically, we observe $$ Y=X^\ast+Z\in \mathbb{R}^{p_{1} \times p_{2} \times p_{3}}, $$ where $X^\ast$ is the ground-truth tensor, and $Z$ is the noise tensor. We propose a simple variant of the higher-order tensor SVD estimator $\widetilde{X}$. We show that uniformly over all user-specified Tucker ranks $(r_{1},r_{2},r_{3})$, $$ \| \widetilde{X} - X^* \|_{ \mathrm{F}}^2 = O \Big( ฮบ^2 \Big\{ r_{1}r_{2}r_{3}+\sum_{k=1}^{3} p_{k} r_{k} \Big\} \; + \; ฮพ_{(r_{1},r_{2},r_{3})}^2\Big) \quad \text{ with high probability.} $$ Here, the bias term $ฮพ_{(r_1,r_2,r_3)}$ corresponds to the best achievable approximation error of $X^\ast$ over the class of tensors with Tucker ranks $(r_1,r_2,r_3)$; $ฮบ^2$ quantifies the noise level; and the variance term $ฮบ^2 \{r_{1}r_{2}r_{3}+\sum_{k=1}^{3} p_{k} r_{k}\}$ scales with the effective number of free parameters in the estimator $\widetilde{X}$. Our analysis achieves a clean rank-adaptive bias--variance tradeoff: as we increase the ranks of estimator $\widetilde{X}$, the bias $ฮพ(r_{1},r_{2},r_{3})$ decreases and the variance increases. As a byproduct we also obtain a convenient bias-variance decomposition for the vanilla low-rank SVD matrix estimators.


Supplementary Material: Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem A Proof of Lemma 1 Lemma 1 F orpw, Zq PC, the functions f pw, Zq " x M1

Neural Information Processing Systems

We need to prove the following two inequalities. Thus, the inequality ( 19) holds trivially. Note that f p w, Z q " x M In this section, we will show that the MIQP presented in ( 4) is at least as hard to solve as a 0 1 Quadratic Program. It should be noted that MIQP ( 4) is stated for a fixed X. The Mixed Integer Quadratic Program (MIQP) ( 4) is NP-hard. " 0. Other cases will be at least as difficult as this case.



Random Feature Models with Learnable Activation Functions

arXiv.org Artificial Intelligence

Current random feature models typically rely on fixed activation functions, limiting their ability to capture diverse patterns in data. To address this, we introduce the Random Feature model with Learnable Activation Functions (RFLAF), a novel model that significantly enhances the expressivity and interpretability of traditional random feature (RF) models. We begin by studying the RF model with a single radial basis function, where we discover a new kernel and provide the first theoretical analysis on it. By integrating the basis functions with learnable weights, we show that RFLAF can represent a broad class of random feature models whose activation functions belong in $C_c(\mathbb{R})$. Theoretically, we prove that the model requires only about twice the parameter number compared to a traditional RF model to achieve the significant leap in expressivity. Experimentally, RFLAF demonstrates two key advantages: (1) it performs better across various tasks compared to traditional RF model with the same number of parameters, and (2) the optimized weights offer interpretability, as the learned activation function can be directly inferred from these weights. Our model paves the way for developing more expressive and interpretable frameworks within random feature models.


Reviews: Computationally and statistically efficient learning of causal Bayes nets using path queries

Neural Information Processing Systems

This paper gives algorithms for recovering the structure of causal Bayesian networks. The main focus is on using path queries, that is asking whether a direct path exists between two nodes. Unlike with descendant queries, with path queries one could only hope to recover the transitive structure (an equivalence class of graphs). The main contribution here is to show that at least this can be done in polynomial time, while each query relies on interventions that require only a logarithmic number of samples. The author do this for discrete and sub-Gaussian random variables, show how the result can be patched up to recover the actual graph, and suggest specializations (rooted trees) and extensions (imperfect interventions).


Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data

arXiv.org Machine Learning

Epilepsy affects over 50 million people globally, with EEG/MEG-based spike detection playing a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and requires specialized training, limiting the number of professionals available to analyze EEG/MEG data. To address this, various algorithmic approaches have been developed. However, current methods face challenges in handling varying channel configurations and in identifying the specific channels where spikes originate. This paper introduces a novel Nested Deep Learning (NDL) framework designed to overcome these limitations. NDL applies a weighted combination of signals across all channels, ensuring adaptability to different channel setups, and allows clinicians to identify key channels more accurately. Through theoretical analysis and empirical validation on real EEG/MEG datasets, NDL demonstrates superior accuracy in spike detection and channel localization compared to traditional methods. The results show that NDL improves prediction accuracy, supports cross-modality data integration, and can be fine-tuned for various neurophysiological applications.


Environment Invariant Linear Least Squares

arXiv.org Machine Learning

This paper considers a multi-environment linear regression model in which data from multiple experimental settings are collected. The joint distribution of the response variable and covariates may vary across different environments, yet the conditional expectations of $y$ given the unknown set of important variables are invariant. Such a statistical model is related to the problem of endogeneity, causal inference, and transfer learning. The motivation behind it is illustrated by how the goals of prediction and attribution are inherent in estimating the true parameter and the important variable set. We construct a novel environment invariant linear least squares (EILLS) objective function, a multi-environment version of linear least-squares regression that leverages the above conditional expectation invariance structure and heterogeneity among different environments to determine the true parameter. Our proposed method is applicable without any additional structural knowledge and can identify the true parameter under a near-minimal identification condition. We establish non-asymptotic $\ell_2$ error bounds on the estimation error for the EILLS estimator in the presence of spurious variables. Moreover, we further show that the $\ell_0$ penalized EILLS estimator can achieve variable selection consistency in high-dimensional regimes. These non-asymptotic results demonstrate the sample efficiency of the EILLS estimator and its capability to circumvent the curse of endogeneity in an algorithmic manner without any prior structural knowledge. To the best of our knowledge, this paper is the first to realize statistically efficient invariance learning in the general linear model.


Outlier-robust Estimation of a Sparse Linear Model Using Invexity

arXiv.org Artificial Intelligence

In this paper, we study problem of estimating a sparse regression vector with correct support in the presence of outlier samples. The inconsistency of lasso-type methods is well known in this scenario. We propose a combinatorial version of outlier-robust lasso which also identifies clean samples. Subsequently, we use these clean samples to make a good estimation. We also provide a novel invex relaxation for the combinatorial problem and provide provable theoretical guarantees for this relaxation. Finally, we conduct experiments to validate our theory and compare our results against standard lasso.


Learning and Optimization with Seasonal Patterns

arXiv.org Machine Learning

Online learning, or more specifically, the multi-armed bandit (MAB) problem, focuses on the task of learning the reward distributions from an unknown environment while simultaneously optimizing cumulative rewards over a fixed time horizon T. This problem has been studied extensively when the environment (i.e., reward distributions) is stationary over time, with numerous algorithms proposed to tackle the tradeoff between exploration and exploitation when making decisions (see Bubeck et al. 2012 for a comprehensive review). While the stationarity assumption about the reward distributions greatly simplifies the analysis, it does not hold in many decision problems in OR/MS and other fields when the environment is time-varying. For example, a fashion retailer should take into account the seasonal demand shift when setting the prices for apparels, and a hospital needs to consider the variation of the patient arrival rate when scheduling the medical staff. Despite the practical relevance, it is difficult to develop a learning policy for non-stationary rewards, especially when the dynamics can change arbitrarily over time. Recent studies (Besbes et al., 2015) have considered cases in which the environment does not change fast with respect to the length of the time horizon, e.g., when a budget sublinear in T is imposed on the total variation of the underlying reward distribution.